Semantic segmentation research has recently witnessed rapid progress, butmany leading methods are unable to identify object instances. In this paper, wepresent Multi-task Network Cascades for instance-aware semantic segmentation.Our model consists of three networks, respectively differentiating instances,estimating masks, and categorizing objects. These networks form a cascadedstructure, and are designed to share their convolutional features. We developan algorithm for the nontrivial end-to-end training of this causal, cascadedstructure. Our solution is a clean, single-step training framework and can begeneralized to cascades that have more stages. We demonstrate state-of-the-artinstance-aware semantic segmentation accuracy on PASCAL VOC. Meanwhile, ourmethod takes only 360ms testing an image using VGG-16, which is two orders ofmagnitude faster than previous systems for this challenging problem. As a byproduct, our method also achieves compelling object detection results whichsurpass the competitive Fast/Faster R-CNN systems. The method described in this paper is the foundation of our submissions tothe MS COCO 2015 segmentation competition, where we won the 1st place.
展开▼